Resource Library: Multimedia

Text File | 1992-04-02 | 29KB | 633 lines

APPENDIX D. HYPERTEXT INFORMATION ACCESS STUDY

INTERVIEW SUMMARY
NEIL LARSON
BERKELEY, CAL.
MARCH 11, 1991

A. HYPERTEXT ARCHIVE TRANSACTION/SUPPORT SYSTEM:

A.1. Please summarize the basic hypertext document content assembly & maintenance procedures.

The database consists of accounting and auditing information converted to a hypertext format for Deloitte & Touche. A Deloitte & Touche group has defined the strategic plan for the database, covering content, maintenance, and plans for expansion. Deloitte & Touche provides all material for the database.

** The information arrives almost entirely in printed hard-copy form and must be converted to electronic format. The current operation uses a Kurzweil OCR unit for conversion and can convert approximately 500 pages per day, if needed, with two people working. One person handles the physical processing on the OCR unit; the other handles spell-checking, error correction, and initial text formatting.

** The next step is document inspection, analysis, and conversion to a screen format suitable for hypertext. The major task is breaking a linear document into separate hierarchical sections, which involves several subtasks. First, the screen format is adapted for best display and user comprehension. The document must be divided into multiple short sections, each attempting to form a logical text unit, or hypertext node, covering a single topic within a preferred maximum of one screen length. As the source text is split into hypertext nodes, the author embeds links to other relevant sections, and "continuity links" to the previous and next sections.

** The author must also insert links which fit the whole document into the system's overall conceptual hierarchy, a hierarchy he is continually revising and redefining. MaxThink hypertext links are designated by the target MS-DOS filename surrounded by angle brackets. This link convention is case-insensitive.
E.g., either of the following forms works: <filename> or <FILENAME>.

A.2. GENERAL PRESENTATION AND PRODUCTION DESIGN OF THE HYPERTEXT ACCESS SYSTEM:

A.2.a. Describe the general arrangement of the main document file. (Unique document identification, general logical arrangement, basic principle of access)

MaxThink chose to use plain MS-DOS ASCII text files for basic node text storage. The text files are directly accessible by the combination of subdirectory name and file name. MS-DOS file retrieval performance degrades seriously if a disk subdirectory holds substantially more than 100 files. They solved this limitation with a system of hierarchical or specialized subdirectories, limiting each to approximately 100 files, and they use a general classification approach in assigning files to subdirectories. [NOTE: The subdirectory approach is also covered in the Phillips interview notes.]

They use file-naming conventions to produce unique text filenames. These standardized names may reflect a combination of factors, such as source of information, document type, time/date of publication, source file section, etc. The conventions are generally mnemonic, so users easily learn the coding and can predict file content.

A.2.b. Please summarize the general concepts of the system's "user interface," the document access and display methods, design of the presentation means, etc.

Larson says the design goal was to arrange information in a clear, simple way, so that people can find it. They developed a hypertext presentation mechanism which they feel is intuitively obvious. They also attempted to design powerful hierarchy and indexing approaches, so the material would be accessible from many different viewpoints.

The interface design is based almost entirely on the four cursor arrow keys. The arrows are a metaphor for "jumping" to another location in the information base.
The up and down cursor arrows select from the links displayed on the screen; the right cursor arrow executes the jump; the left cursor arrow backtracks to the origin or "jump-off" location.

Larson says, "This hypertext navigation metaphor is so simple that it takes a user about 30 seconds to learn. It is complemented by providing an effective hierarchical system of networked menus, in combination with an indexing system."

Both approaches use "embedded menus." These feature obvious, eye-readable hypertext links, used along with a clear and obvious menu structure or descriptive surrounding text. He goes on, "Our menus attempt to build a conceptual structure of the topic. We use metaphors to express the thought patterns or structures relating to the topic. We intend to express the domain structure with such memorable, obvious metaphors that users will adopt the structure; that it becomes their structure."

A.2.c. Identify and briefly describe the general production tools or building tools used in construction of the system.

Larson describes their approach to hypertext construction as building the system out of nodes, or fragments of information, which have been "decomposed" from the original printed documents. He notes the necessity of identifying the information content in the nodes, and of linking or sequencing them into a meaningful, communicative knowledge structure.

He describes the use of three major tools for building the hypertext system: an editor, used for formatting and insertion of links; an outliner, used to form hypertext hierarchies; and a matrix outliner, or network builder, used to create complex hypertext networks. (These are more fully described in the next section.) He feels that these three tools give them the ability to construct three powerful and complementary approaches.
He describes these as:
* Taxonomic approach - using hierarchies
* Linguistic approach - using the glossary index
* Hypertext network - using complex interconnected networks

He also mentions the use of various utility programs, described in the next section.

A.2.d. Identify and briefly describe the specialized organizational and quality control tools which allow you to build the system.

** "TransText" - the hypertext word processor. They consider this editor the most important tool. It is used for formatting, editing, "splitting up" or breaking the file into nodes, and inserting hypertext links. It thus handles both the transformation of the file information into an effective, communicative display format and the insertion of the links themselves.

** "MaxThink" - outliner, used as the major hierarchical tool. It can create classes, sequences, boundaries, and hierarchies (with inheritance). It is used to create logical structures or metaphors of the information domain, which can automatically generate hypertext hierarchies.

** "Houdini" - network-building tool. This program is a matrix outliner and can build "3-dimensional" outlines, where any node can be connected to any other node. These networks can also interconnect to and within other networks. Again, the Houdini matrix networks can automatically generate hypertext networks. The network headings also generate a KWOC "glossary" index, which is always instantly available to the user.

Larson pointed out that they also use a number of utility programs for specialized editing and control functions. Some examples are:

** REFALL - shows all hypertext jumps FROM a file. Good for analyzing patterns of hypertext linkage.
** INVERT - shows all hypertext jumps TO a file. Good for analyzing patterns of hypertext linkage.
** CONNECT - shows all generations of input and output links to a group of specified files. Good for analyzing patterns of hypertext linkage.
** LINE - creates a <linked> list of all hypertext source text nodes, including the title line or descriptive first line of text. The list can be used with the TransText editor, or imported into the MaxThink outliner or the Houdini matrix outliner. Good for identifying text content nodes and incorporating them into the network.
** IC - (Integrity Checker) used to check for blind references to non-existent files and for link name errors.
** Glossary building utilities - produce an "online index" to network nodes and file titles, presented in KWOC format. They perform depluralization, synonym control, and sorting of index entries by source document type.

B. THE HYPERTEXT INFORMATION ACCESS SYSTEM:

B.1. ACCESS POINTS - Which of the following types of access points are included in your system? For each question item, please rate using the following categories, and comment as needed:
P)resent, E)asily achievable, M)odifications needed, N)ot achievable

B.1.a. Main file sequence - direct file access
Category: [P] E M N
Hypertext nodes are retrievable by ASCII file name.

B.1.b. Author
Category: [P] E M N
Editorial decision. Author indexing is included in DaTa in many instances.

B.1.c. Title
Category: [P] E M N
Editorial decision. Included in DaTa in many cases.

B.1.d. Name forms
Category: [P] E M N
Editorial decision. Optionally included.

B.1.d.i. Personal names
Category: [P] E M N
Editorial decision. Optionally included.

B.1.d.ii. Corporate names (companies, organizations, government, etc.)
Category: [P] E M N
Editorial decision. Optionally included.

B.1.e. Keywords
Category: [P] E M N
Keyword access through the "Glossary" KWOC index.

B.1.f. Subject/Topic/Concept
Category: [P] E M N
Via hierarchy, network, and KWOC index.

B.1.g. Geographic
Category: [P] E M N
Editorial decision. Optionally included. Present in DaTa as part of the hierarchy.

B.1.h. Date, chronological, temporal
Category: [P] E M N
Editorial decision. Optionally included.
Present in DaTa as part of the hierarchy, as well as in filename conventions.

B.1.i. Language
Category: [P] E M N
This is purely an editorial decision; the capability is present. Minor software modifications may be needed to handle the extended ASCII character set for foreign languages.

B.1.j. Document format - book, article, pamphlet, report, etc.
Category: P [E] M N
Editorial decision. Optionally included.

B.1.k. Document position - section, page, location
Category: [P] E M N
Editorial decision. Can optionally be included as part of the hierarchy, but this would be labor-intensive. It would be most efficient to add this as a link call to an external searching program able to handle positional or string specifications.

B.1.l. Automated field specifications - record size, entry date, notations, originator, etc.
Category: [P] E M N
MaxThink utilities include a string-searching program, callable from an embedded hypertext link. The hypertext links can similarly call any external DOS program. [The investigator, for example, has built a system with link calls to the Zyindex text search & retrieval program. The Zyindex index file allowed full-text search of the entire hypertext database, in addition to regular hypertext links.]

B.2. ACCESS APPROACHES - Which of the following subject or topical information devices are used in your system? For each question item, please rate using the following categories, and comment as needed:
P)resent, E)asily achievable, M)odifications needed, N)ot achievable

B.2.a. Classification schemes

B.2.a.i. Hierarchical taxonomy
Category: [P] E M N
Yes, we view the generated hierarchy and linked network as a classification scheme, more flexible and powerful than the standard linear taxonomy.

B.2.a.ii. Enumerative, universal classification [Dewey type classification]
Category: P [E] M N
Editorial decision. Optionally included. Any classification can be embedded or expressed in the hypertext hierarchy.

B.2.a.iii. Specialized, literary warrant classification [Library of Congress, Reader Interest Classification]
Category: P [E] M N
Editorial decision. Optionally included. Any classification can be embedded or expressed in the hypertext hierarchy.

B.2.a.iv. Faceted classification (analytico-synthetic) [PRECIS style of indexing] [C., p. 65]
Category: P [E] M N
Editorial decision. Optionally included. Any classification can be embedded or expressed in the hypertext hierarchy.

B.2.b. Indexing approaches

B.2.b.i. Alphabetical index, separate or dictionary file
Category: [P] E M N
Present.

B.2.b.i.A. Keywords, extracted or assigned
Category: [P] E M N
Have utilities for term extraction, which will be developed further. The KWOC index utility rotates assigned network headings or file title words.

B.2.b.i.B. Controlled vocabulary assignment
Category: P [E] M N
Editorial decision. Optionally included. At present, the KWOC index utility optionally rotates either assigned network headings or file title phrases.

B.2.b.i.C. Relative index, e.g., to Dewey classification
Category: P [E] M N
Editorial decision. Optionally included via taxonomy.

B.2.b.ii. Term manipulation indexes (generally for production of printed output)
Category: [P] E M N
An integral part of the system.

B.2.b.ii.A. Simple permuted or rotated - KWIC
Category: P [E] M N
Editorial decision. Optionally included.

B.2.b.ii.B. Ordered by extracted element - KWOC
Category: [P] E M N
An integral part of the system.

B.2.b.ii.C. String indexing (phrase manipulation, rotation of terms) - PRECIS, NEPHIS, etc.
Category: P [E] M N
Editorial decision. Optionally included. Achievable by creating the index with an external utility, then importing it into taxonomy form.

B.2.b.ii.D. Chain indexing (string indexing, with forms reflecting the basic taxonomy of terms) [C., p. 67]
Category: P [E] M N
Editorial decision. Optionally included. Achievable by creating the index with an external utility, then importing it into taxonomy form.

B.2.b.iii. Classified index (generally requires a secondary alphabetical index, for ease of use) [C., p. 56]
Category: P [E] M N
Editorial decision. Optionally included. Achievable by creating the index with an external utility, then importing it into taxonomy form.

B.2.b.iv. Coordinate indexing - manual coordination or automated database file, using Boolean search [C., p. 60]
Category: P [E] M N
Editorial decision. Optionally included. Achievable by a call to an external program.

B.2.b.iv.A. Older non-automated searching methods - peekaboo, edge-notched cards, Uniterm terminal digit cards
Category: P E M [N]
Not applicable. This system does not use a hard-copy file record format.

B.2.b.iv.B. Database file search - sequential or indexed field search
Category: P [E] M N
Editorial decision. Optionally included. Achievable by a call to an external program.

B.2.b.iv.C. Full text search
Category: [P] E M N
During the interview, and elsewhere, Larson voices strong subjective disapproval of this information retrieval approach (Fersko-Weiss 1991). Nevertheless, MaxThink provides SEARCH and CD-INDEX, two program modules which provide this option. This is an editorial decision; the text-searching feature may be optionally included. The hypertext links can also call other, more powerful string-searching programs. An example is National Legal Research Systems' Qwik-Rules (TM) legal rules hypertext information system, which was built with MaxThink hypertext software and provides links to QWIKFIND, their own text-searching engine. As mentioned elsewhere, the investigator himself has also built systems with link calls to Zyindex, Golden Retriever, Power Search, and other text-searching programs.

B.2.b.v. Faceted indexing [C., p. 65]
Category: P [E] M N
Editorial decision. Optionally included. Achievable by creating the index with an external utility, then importing it into taxonomy form.

B.2.b.vi. Citation indexing [C., p. 72]
Category: P [E] M N
Editorial decision. Optionally included.
Achievable by creating the index with an external utility, then importing it into taxonomy form.

B.3. CONTROL MECHANISMS - Which of the following subject access control measures, intended to control consistency, form, and item sequencing, are present in your system? For each question item, please rate using the following categories, and comment as needed:
P)resent, E)asily achievable, M)odifications needed, N)ot achievable

B.3.a. Classification schedule
Category: [P] E M N
The hierarchical taxonomy is equivalent to a flexible classification schedule, in our opinion.

B.3.b. Vocabulary control systems
Category: [P] E M N
Editorial decision. Optionally included. Our Glossary utility presently uses controls on form of entry, e.g., depluralization (singular preferred), synonym cross-references, stopword lists for the KWOC index, and automatic sorting by entry type. We are also considering automatic word-stemming for the KWOC index.

B.3.b.i. Authority/Headings files
Category: P [E] M N
Editorial decision. Optionally included. Achievable by external manual or automated means.

B.3.b.ii. Thesaurus control
Category: P [E] M N
Editorial decision. Optionally included. Achievable by external manual or automated means.

B.3.b.iii. Derived-term methods or algorithms
Category: P [E] M N
The DaTa operation already uses term extraction utilities for analyzing files and groups of files. MaxThink is considering developing more advanced term extraction utilities, based on word frequency, per Miranda Pao. This could also be achieved by using third-party software for index term extraction.

B.3.b.iv. Hierarchical search thesaurus (for database file search)
Category: P E M [N]
This approach is not currently used, nor is it realistic, since the primary approach is not a "searching" methodology. If editorial decision mandates it, authors could achieve this via a link call to an external searching program with this capability, e.g., Zyindex or MicroBASIS.

B.3.b.v. Entry term form control mechanisms
Category: [P] E M N
Editorial decision. Optionally included. Achievable externally, using manual or automated means.

B.3.b.v.A. Entry syntax (preferred noun/adjective, etc., construction form)
Category: [P] E M N
The present approach is entirely a matter of editorial policy control. E.g., the DaTa CD-ROM product operates with preferred usages.

B.3.b.v.B. Standard number approach (plural, singular form preference)
Category: [P] E M N
The present DaTa approach prefers the singular form and uses depluralization in the glossary KWOC utility.

B.3.b.v.C. Automatic depluralization (database file)
Category: P [E] M N
Not applicable using the associative linking approach. Depluralization can be implemented in hypertext index representations; the present DaTa approach prefers the singular form and uses depluralization in the glossary KWOC utility. As an alternative, an author can also use links to external database software with this capability.

B.3.b.v.D. Synonym definition (database file)
Category: [P] E M N
This is an editorial decision. The KWOC glossary utility program includes automatic synonym handling, cross-references, etc., for construction of the KWOC index.

B.3.c. "Standard Subdivision" or faceted classification protocol
Category: [P] E M N
Standard extensions in the filename conventions identify document types, and standard coding reflects document types in the network/glossary files. This also results in sorting by document or node type in the KWOC index.

B.3.d. Term or descriptor relationships - roles, links, weighting
Category: P [E] M N
Not currently used, nor realistic, since the primary approach is not a "searching" methodology. If editorial decision mandated it, this could be achieved by a link call to external searching programs with this capability.

B.3.e. Filing or sorting rules
Category: [P] E M N
For convenience, they currently use a straight ASCII sort for the KWOC index, with sub-sorts by document or node type.
The network taxonomy certainly reflects a subjective, author-imposed ordering or hierarchy. Any other sorting sequence for the KWOC index could be supported with the correct algorithm in the external sorting utility.

B.3.f. Manual or automated authority/procedural safety measures
Category: [P] E M N
A full set of utilities, described above, checks linking patterns, clustering, link name spelling errors, blind references, file text contents, etc. In addition, the production team follows normal computer operating practice for file backups, off-site copies, working-copy backups, etc.
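The linkage utilities named in A.2.d (REFALL, INVERT, CONNECT, and the IC integrity checker) are described only by what they report. As a rough illustration of what such reports involve, the Python sketch below scans node texts for <filename> links and treats them case-insensitively, as the A.1 link convention states. It is not MaxThink's code: the function names, the in-memory dict standing in for the MS-DOS node files, and the link-matching pattern are all assumptions made for this example.

```python
import re
from collections import defaultdict, deque

# Assumption: a link is any run of non-blank characters other than angle
# brackets, enclosed in angle brackets, e.g. <audit01> or <AUDIT01>.
LINK_PATTERN = re.compile(r"<([^<>\s]+)>")


def extract_links(text: str) -> list[str]:
    """Link targets in order of appearance, upper-cased because the
    link convention is case-insensitive."""
    return [m.group(1).upper() for m in LINK_PATTERN.finditer(text)]


def refall(nodes: dict[str, str]) -> dict[str, list[str]]:
    """REFALL-style report (hypothetical): all jumps FROM each node."""
    return {name.upper(): extract_links(text) for name, text in nodes.items()}


def invert(nodes: dict[str, str]) -> dict[str, set[str]]:
    """INVERT-style report (hypothetical): all jumps TO each target."""
    incoming: dict[str, set[str]] = defaultdict(set)
    for source, targets in refall(nodes).items():
        for target in targets:
            incoming[target].add(source)
    return dict(incoming)


def _reachable(graph, start) -> set[str]:
    """Breadth-first search: every node reachable from `start` in `graph`.
    Start nodes appear in the result only if a cycle leads back to them."""
    seen: set[str] = set()
    queue = deque(start)
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def connect(nodes: dict[str, str], files: list[str]) -> tuple[set[str], set[str]]:
    """CONNECT-style report (hypothetical): all generations of output and
    input links for a group of files, returned as (outgoing, incoming)."""
    start = [f.upper() for f in files]
    return _reachable(refall(nodes), start), _reachable(invert(nodes), start)


def blind_references(nodes: dict[str, str]) -> list[tuple[str, str]]:
    """IC-style check (hypothetical): (source, target) pairs whose target
    file does not exist."""
    existing = {name.upper() for name in nodes}
    return [
        (source, target)
        for source, targets in refall(nodes).items()
        for target in targets
        if target not in existing
    ]


if __name__ == "__main__":
    nodes = {
        "MENU": "Audit topics: <audit01> <AUDIT02> <taxes>",
        "AUDIT01": "Planning. Next: <audit02>  Back: <menu>",
        "AUDIT02": "Fieldwork. Back: <Audit01>",
    }
    print(blind_references(nodes))  # [('MENU', 'TAXES')]
    print(sorted(invert(nodes)["AUDIT02"]))  # ['AUDIT01', 'MENU']
```

Deriving the INVERT map from the REFALL map keeps the two reports consistent with each other, and a CONNECT-style report then reduces to a breadth-first search over either map.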
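The glossary controls listed under B.3.b and B.3.e (stopword lists, singular-preferred depluralization, synonym handling, and a straight ASCII sort with sub-sorts by document type) can be illustrated with a small KWOC (Key Word Out of Context) builder: each significant title word is lifted out as a heading, and the full title is listed beneath it. This is a hypothetical Python sketch, not the MaxThink glossary utility. The stopword list, the synonym table, the naive depluralization rules, and the reading of filename extensions as document-type codes (loosely following B.3.c) are all assumptions, as is writing the link with its extension.

```python
import re

# Hypothetical stopword list; the real utility's list is not documented here.
STOPWORDS = {"A", "AN", "AND", "FOR", "IN", "OF", "ON", "THE", "TO"}

# Hypothetical synonym table: variant heading -> preferred heading.
SYNONYMS = {"AUDITING": "AUDIT"}


def depluralize(word: str) -> str:
    """Naive singular-preferred rule (an illustration, not MaxThink's algorithm)."""
    if len(word) > 4 and word.endswith("IES"):
        return word[:-3] + "Y"          # LIABILITIES -> LIABILITY
    if word.endswith(("SS", "IS", "US")):
        return word                      # BUSINESS, ANALYSIS, STATUS
    if len(word) > 3 and word.endswith("S"):
        return word[:-1]                 # LEASES -> LEASE
    return word


def doc_type(filename: str) -> str:
    """Document type read from the filename extension (assumed convention)."""
    return filename.rsplit(".", 1)[1].upper() if "." in filename else ""


def kwoc_index(titles: dict[str, str]) -> list[tuple[str, str, str, str]]:
    """Return (keyword, doc_type, title, filename) entries, one per
    significant title word, sorted by keyword and then document type."""
    entries = set()
    for filename, title in titles.items():
        for word in re.findall(r"[A-Za-z]+", title):
            key = word.upper()
            if key in STOPWORDS:
                continue
            key = depluralize(key)
            key = SYNONYMS.get(key, key)  # fold synonyms into the preferred heading
            entries.add((key, doc_type(filename), title, filename.upper()))
    # Python compares str by code point, which for ASCII text is a straight
    # ASCII sort; tuple ordering supplies the sub-sort by document type.
    return sorted(entries)


def format_kwoc(entries: list[tuple[str, str, str, str]]) -> str:
    """Print each keyword once, with its linked titles indented beneath it."""
    lines, current = [], None
    for key, _dtype, title, filename in entries:
        if key != current:
            lines.append(key)
            current = key
        lines.append(f"    {title} <{filename}>")
    return "\n".join(lines)


if __name__ == "__main__":
    titles = {
        "fas005.std": "Accounting for Leases",
        "aud010.gui": "Auditing Lease Transactions",
        "aud011.gui": "Audit Procedures",
    }
    print(format_kwoc(kwoc_index(titles)))
```

The naive rules show why a production depluralizer needs an exception list: TAXES would become TAXE here.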